This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Dependency Update] CUDA10.1 Support #14887

Merged
merged 3 commits into from
May 21, 2019

Conversation

stu1130
Contributor

@stu1130 stu1130 commented May 6, 2019

Description

Upgrade to CUDA 10.1 with the latest cuDNN 7.5.1 and NCCL 2.4.2.

Checklist

Ran three models: ResNet50 on ImageNet, LSTM on PTB, and MLP on MNIST
Performance is shown below
Environment: P3.16xlarge Deep Learning Base AMI
Codebase: commit d87bd2a
Throughput is measured in samples per second
Each throughput figure is the average of 5 runs

ResNet

model: Resnet50
dataset: Imagenet
number of gpu: 8
epochs: 90 (the regression we found recently only has a significant impact on runs with many epochs)
preprocess command: sudo pip install gluoncv==0.2.0b20180625
command: python mxnet_benchmark/train_imagenet.py --use-rec --batch-size 128 --dtype float32 --num-data-workers 40 --num-epochs 3 --gpus 0,1,2,3,4,5,6,7 --lr 0.05 --last-gamma --mode symbolic --model resnet50_v1b --rec-train /home/ubuntu/data/train-passthrough.rec --rec-train-idx /home/ubuntu/data/train-passthrough.idx --rec-val /home/ubuntu/data/val-passthrough.rec --rec-val-idx /home/ubuntu/data/val-passthrough.idx
github repo: https://github.com/rahul003/deep-learning-benchmark-mirror.git

| Throughput (samples/s) | CUDA 10.1 cuDNN 7.5.1/NCCL 2.4.2 | CUDA 10 cuDNN 7.5.1/NCCL 2.4.2 | Performance Difference |
| --- | --- | --- | --- |
| with MKLDNN | 2817.18815 | 2791.17069 | 0.932% |
| without MKLDNN | 2821.88889 | 2798.57505 | 0.833% |

LSTM

model: LSTM
dataset: PTB(Penn Treebank)
number of gpu: 1
epochs: 10
command:
python2 benchmark_driver.py --framework mxnet --task-name mkl_lstm_ptb_symbolic --num-gpus 1 --epochs 10 --metrics-suffix test --kvstore local
python word_language_model/lstm_bucketing.py --num-hidden 650 --num-embed 650 --gpus 0 --epochs 10 --kv-store local

| Throughput (samples/s) | CUDA 10.1 cuDNN 7.5.1/NCCL 2.4.2 | CUDA 10 cuDNN 7.5.1/NCCL 2.4.2 | Performance Difference |
| --- | --- | --- | --- |
| with MKLDNN | 1015.61785 | 869.05555 (Performance Regression) | 16.865% |
| without MKLDNN | 1015.01455 | 830.68338 (Performance Regression) | 22.190% |

CUDA 10 has a performance regression issue; see #14725 for more details.

MLP

I changed the MLP script, so the performance might be a little worse than before
model: 3 dense layers with num_hidden=64 and relu as activation
dataset: MNIST
number of gpu: 1
epochs: 10
command:
python2 benchmark_runner.py --framework mxnet --metrics-policy mlp --task-name mlp --metrics-suffix test --num-gpus 1 --command-to-execute 'python3 mlp.py' --data-set mnist

| Throughput (samples/s) | CUDA 10.1 cuDNN 7.5.1/NCCL 2.4.2 | CUDA 10 cuDNN 7.5.1/NCCL 2.4.2 | Performance Difference |
| --- | --- | --- | --- |
| with MKLDNN | 4422.72478 | 4403.72596 | 0.431% |
| without MKLDNN | 4342.19752 | 4329.25058 | 0.299% |
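
For anyone re-checking the numbers, the "Performance Difference" column appears to be the relative gain of the CUDA 10.1 throughput over the CUDA 10 throughput. The PR does not state the formula, so the sketch below is an assumption; the helper name `percent_gain` and the `results` layout are illustrative and not part of the benchmark scripts. The throughput values are copied from the three tables above.

```python
def percent_gain(new, baseline):
    """Relative throughput gain of `new` over `baseline`, in percent.

    Assumed formula (not stated in the PR): (new - baseline) / baseline * 100.
    """
    return (new - baseline) / baseline * 100.0


# (CUDA 10.1 throughput, CUDA 10 throughput) in samples/second,
# copied from the ResNet, LSTM and MLP tables.
results = {
    ("ResNet50", "with MKLDNN"): (2817.18815, 2791.17069),
    ("ResNet50", "without MKLDNN"): (2821.88889, 2798.57505),
    ("LSTM", "with MKLDNN"): (1015.61785, 869.05555),
    ("LSTM", "without MKLDNN"): (1015.01455, 830.68338),
    ("MLP", "with MKLDNN"): (4422.72478, 4403.72596),
    ("MLP", "without MKLDNN"): (4342.19752, 4329.25058),
}

for (model, config), (cuda101, cuda10) in results.items():
    gain = percent_gain(cuda101, cuda10)
    print(f"{model:9s} {config:15s} {gain:7.3f}%")
```

Under this assumption the computed gains agree with the table entries to within rounding, which suggests the CUDA 10 column is the baseline in each row.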

Comments

@szha @lanking520 @eric-haibin-lin

@stu1130 stu1130 requested a review from szha as a code owner May 6, 2019 04:54
@anirudhacharya
Member

@mxnet-label-bot add [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label May 6, 2019
@stu1130 stu1130 changed the title [Dependency Update] CUDA10.1 Support [WIP][Dependency Update] CUDA10.1 Support May 6, 2019
@lanking520
Member

General thoughts: do you think it is necessary for us to have some real-time performance benchmarking whenever we do an upgrade like this?

@stu1130 stu1130 changed the title [WIP][Dependency Update] CUDA10.1 Support [Dependency Update] CUDA10.1 Support May 8, 2019
@stu1130
Contributor Author

stu1130 commented May 8, 2019

@lanking520 Sorry for the late response. Does real-time benchmarking here mean testing performance in the CI system? I would prefer running the benchmark across CUDA 9/9.2/10/10.1 on the nightly build.

@lanking520
Member

@perdasilva I saw a recent commit of yours that downgraded the CUDA version. Do you think these numbers are promising enough for us to push this to master?

Member

@lanking520 lanking520 left a comment


LGTM, since the performance numbers look promising.

@stu1130 stu1130 force-pushed the publish_cuda10_1 branch from e39ac04 to dbf6041 Compare May 16, 2019 18:08
@szha szha merged commit 7b48c24 into apache:master May 21, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
[Dependency Update] CUDA10.1 Support